# Few-shot, Interpolation-based Style-conditioned Text Generation using LLMs

## Downloading the data

You can download the processed datasets through this link: https://drive.google.com/drive/folders/1BczpIxuGDpJ7tkBMzPeoGYLQ0d-Kjng5?usp=drive_link

- `data_split`: the combined Twitter and Gutenberg dataset.
- `data_split_reddit`: the Reddit dataset.

In the steps below, point `path-to-texts-folder` to one of these folders.

## Installation

### 1. Install dependencies

```bash
pip install --no-index -r requirements.txt
```

### 2. Fine-tune Llama 2 to extract weight deltas

```bash
python finetune_llama.py path-to-texts-folder path-to-deltas-folder
```
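
The README does not define "weight deltas". A common reading, assumed here, is the element-wise difference between the fine-tuned and the base parameters. The sketch below illustrates that idea on toy Python lists; `weight_delta` and the parameter name `layer.weight` are hypothetical and are not taken from `finetune_llama.py`.

```python
# Toy illustration only: real Llama 2 weights are large tensors, not short lists.
# Assumption: a "weight delta" is finetuned - base, computed per parameter.

def weight_delta(base, finetuned):
    """Return finetuned - base for every parameter name in both dicts."""
    if base.keys() != finetuned.keys():
        raise ValueError("base and fine-tuned models have different parameters")
    return {
        name: [f - b for b, f in zip(base[name], finetuned[name])]
        for name in base
    }

# Hypothetical parameter name and values.
base = {"layer.weight": [1.0, 2.0, 3.0]}
finetuned = {"layer.weight": [1.5, 2.0, 2.0]}

delta = weight_delta(base, finetuned)
print(delta)  # {'layer.weight': [0.5, 0.0, -1.0]}
```

Under this reading, adding a delta back onto the base weights would recover the corresponding fine-tuned model.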

### 3. Train the VAE

```bash
python pca_train_vae_llama.py path-to-texts-folder path-to-deltas-folder
```

### 4. Filter the latent space

Run `Filtering.ipynb` to filter the latent space down to its continuous subset and to generate the testing routines for the experiments.


### 5. Run the interpolation experiments

```bash
python extract_final_results.py path-to-texts-folder test-corpus-name
```
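
This README does not describe how the interpolation itself is carried out. Assuming it means linear interpolation between two latent codes (for example, VAE encodings of two styles), a minimal sketch looks like the following. The function `interpolate` and the example vectors are hypothetical and are not taken from `extract_final_results.py`.

```python
# Assumption: interpolation is linear, z = (1 - alpha) * z_a + alpha * z_b.
# Latent codes are shown as plain lists; the real code likely uses tensors.

def interpolate(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors of equal length."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    return [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]

# Hypothetical latent codes for two styles.
z_a = [0.0, 2.0]
z_b = [4.0, 0.0]

midpoint = interpolate(z_a, z_b, 0.5)
print(midpoint)  # [2.0, 1.0]
```

With `alpha = 0` the result equals `z_a`, with `alpha = 1` it equals `z_b`, and intermediate values trace a straight line between the two codes.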

### 5.5. (Optional) Run the Llama 2 prompting experiment

```bash
python prompt_llama.py test-corpus-name
```

### 6. Analyze experiment results

Run `Analysis.ipynb`. Edit the cell parameters to generate results under different conditions. This notebook also includes the code for the GPT-3.5 experiment.


NOTE: Some scripts may fail because an output folder does not exist. If that happens, create the folder the code expects and run the step again.
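
One way to avoid missing-folder errors is to create the output folders before running a step. The sketch below uses `pathlib.Path.mkdir` with `parents=True` and `exist_ok=True`, so it is safe to run repeatedly. The folder names in `EXPECTED_DIRS` are placeholders, not the names the scripts actually use; check the scripts and notebooks for the real paths.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names; replace them with the paths the code expects.
EXPECTED_DIRS = ["deltas", "results/interpolation"]

def ensure_dirs(root, names):
    """Create each folder under root (including parents) if it is missing."""
    created = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(path)
    return created

# Demonstrated in a throwaway directory; pass the repository root in practice.
with tempfile.TemporaryDirectory() as root:
    dirs = ensure_dirs(root, EXPECTED_DIRS)
    all_exist = all(d.is_dir() for d in dirs)

print(all_exist)  # True
```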
